Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test(entities): improve docs, doctests, and typing #1034

Closed
wants to merge 39 commits into from

Conversation

bonjourmauko
Copy link
Member

@bonjourmauko bonjourmauko commented Aug 23, 2021

Part of #1061
Superseded by #1220
Superseded by #1252
Superseded by #1255

Documentation

  • Completes the documentation of the entities module

Doctests

  • Completes doctest coverage of the entities module

Typing

  • Completes type annotations of the entities module
  • Consolidates types into typing, under three categories:
    • types: for data-types, which are subsequently split into two sub-categories:
      • XType: an actual data-type —for example an array
      • XLike: an object where the actual data-type is irrelevant, but that can be coerced to X.
    • protocols: for behaviours —think interfaces but for the sole purpose of static type-checking
    • schemas: type-safe data-objects —mostly for dictionaries

Deprecations

  • Entity.set_tax_benefit_system: now provided by a standard property setter.
  • Entity.check_role_validity: moved to helpers as it had nothing to do with entities

New features

  • Renders the relationship entity 1 - * variable declarative, thanks to the _VariableProxy descriptor
  • Adds dataclass and slots for reduced boilerplate and improved performance

Technical changes

  • In GroupEntity, replaces dynamic role attributes assignment for an explicit caching mechanism
    • Allows for slots and thus the performance gains
    • Fixes arbitrary mutability by making attributes declarative

Notes

  1. This PR makes an architectural choice of strong domain model / logic separation, following what has already begun with for example the split of Entity and Population, Simulation and SimulationBuilder, and so on. Although the changset is backward-compatible, future contributions enforcing this choice may introduce breaking changes.
  2. This PR introduces the domain-driven design notion of adapter, proxy, port, or repo: beyond the pure relationship between models, the behavioural logic between them is extracted to specialised models —in this case, querying variables is extracted from Entity to a _VariableProxy. This is just a subset of 1.
  3. This PR consolidates the notion of protocols, or duck-typing, or structural sub-typing: that means type-checks are done not against the actual implementation of a model, but against a protocol, that is, the equivalent of an interface, or an abstract model, but without impact at runtime.
  4. Finally, and more importantly, this PR introduces the notion of schemas. But, contrary to regular schemas used to validate, serialise, and deserialise data from/to models, these schemas are purely type-safe data objects, meant for type-checks, without impact at runtime.

Extended note on data-schemas

What if, instead of just using schemas as typed-dicts for type-checks, we used them at runtime to compulsorily:

a. check types at runtime?
b. validate data at runtime?
c. serialise/deserialise data —as suggested in #1071 ?
d. enforce contracts, or domain logic —both in terms of data ("has to be >0") and representation ("tojson()")?

For
  • Explicit declaration and documentation of the two most important data models of OpenFisca, today dynamic and implicit: Formula and Test.
  • Explicit declaration and documentation of interface contracts and data transmutation —this is already, partially, done with OpenAPI
  • Complete separation of actual model and representational logic —extracting json logic from TaxBenefitSystem, Holder, and so on!
  • Composability: once all data/logic are split, and data/representation through explicit schemas, then OpenFisca would become fully-modular. That is, users can use drop-in replacements tailored to their needs, as long as those replacements respect the interface contracts: data in messagepack, models in rust, a calculation engine nodejs, etc.
Against

Performance: whereas declarative data transmutation can even increase performance for individual situations —WebAPI—, a naive or out-of-the-box implementation (or anything with a complexity of at least O(n)) will certainly have a very penalising impact on performance for large population simulations.

This due to several reasons:

  • Economists already validate data before running a simulation — @benjello can you confirm?
  • numpy already provides data type-casting, with configurable levels of safety, and C-optimised
  • Other that I cannot think of right now 🤣

Possible workarounds:

  • make it opt-in with a flag —for example an optional WebAPI flag --safe or --strict
  • implement validations, serialisation, etc., vectorially from the outset —literally extending the numpy C-API and vendoring C-extensions, pre-compiled or not

@bonjourmauko bonjourmauko added the kind:refactor Refactoring and code cleanup label Aug 23, 2021
@bonjourmauko bonjourmauko requested a review from a team August 23, 2021 19:46
Copy link
Contributor

@HAEKADI HAEKADI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @maukoquiroga :)

openfisca_core/commons/rates.py Outdated Show resolved Hide resolved
openfisca_core/commons/rates.py Outdated Show resolved Hide resolved
openfisca_core/commons/tests/test_formulas.py Outdated Show resolved Hide resolved
openfisca_core/commons/tests/test_dummy.py Outdated Show resolved Hide resolved
openfisca_core/commons/tests/test_formulas.py Outdated Show resolved Hide resolved
openfisca_core/commons/tests/test_formulas.py Outdated Show resolved Hide resolved
openfisca_core/entities/helpers.py Show resolved Hide resolved
openfisca_core/entities/role.py Show resolved Hide resolved
openfisca_core/entities/tests/test_helpers.py Outdated Show resolved Hide resolved
openfisca_core/entities/tests/test_entity.py Outdated Show resolved Hide resolved
@MattiSG
Copy link
Member

MattiSG commented Aug 24, 2021

Thanks @maukoquiroga.

This is marked as depending on #1033 but I see redundant changes, for example in the config file.

I would be in favour of having a single PR as they seem to address the same issue (openfisca/openfisca-doc#244) and to improve documentation altogether.
If you'd prefer not to for some reason, could we at least update the target branch to #1033? 🙂 The current changesets seem likely to yield conflicts.

@bonjourmauko bonjourmauko marked this pull request as draft August 24, 2021 10:32
@bonjourmauko
Copy link
Member Author

bonjourmauko commented Aug 30, 2021

This is marked as depending on #1033 but I see redundant changes, for example in the config file.

Corrected it to depend on #1038 to depend on both #1038 and #1033, and changed base for clarity as suggested , the redundant changes are effectively just in the setup and config files, the rest has to be properly rebased as they're independent from #1033 (I will do shortly).

If you'd prefer not to for some reason, could we at least update the target branch to #1033? 🙂 The current changesets seem likely to yield conflicts.

I will rebase properly, that should get us rid of the conflicts 😃

I would be in favour of having a single PR as they seem to address the same issue (openfisca/openfisca-doc#244) and to improve documentation altogether.

I thought so initially as well, and I tried, then I decided to split this work by "submodule" and add doc as well as fixing the doctests. Here's what happened:

  • I tried to fix all doctests at once, which I couldn't for several reasons:

    • Some parts of the code are just un-unit-testable without actual refactoring (creating a Holder for example involves almost half of the codebase, including the Holder itself…)
    • Then I tried decoupling some classes to use dependency injection —even if that means breaking changes— as so to make them more granular an unit-testable, but the actual test suite do not cover all scenarios, catch-22…
    • In a last effort to try to understand what I was doing I turned myself over to the documentation, which is excellent already, but still incomplete and outdated, because not checked regularly and automatically, catch-22…
  • Then I adopted the current attack strategy so to:

    • Fix/complete the doctests of the more independent/uncoupled modules, to avoid (yet) refactoring
    • Fix/complete the documentation to discover refactoring opportunities (the todos)
    • To set a standard that can be applied partially but progressively over time
    • To increase the confidence on the tests enough as to do the needed refactoring and breaking changes

I think we probably won't be able to fix all the doctests soon, that's why I changed from an horizontal to a vertical (submodules) approach.

Happy to have an extended discussion on that with the community if you think that could be useful!

@bonjourmauko bonjourmauko changed the base branch from master to doctests/commons August 31, 2021 11:00
@bonjourmauko bonjourmauko changed the title Improve openfisca-core's entities module doctests Improve entities module doctests Aug 31, 2021
@bonjourmauko bonjourmauko force-pushed the doctests/commons branch 2 times, most recently from 36a5929 to 799ada9 Compare August 31, 2021 12:52
@bonjourmauko bonjourmauko marked this pull request as ready for review September 7, 2021 17:48
@bonjourmauko bonjourmauko force-pushed the doctests/entities branch 2 times, most recently from 183fca2 to 1fbfa33 Compare September 9, 2021 11:23
@bonjourmauko bonjourmauko force-pushed the doctests/entities branch 5 times, most recently from 9bc6af7 to d773990 Compare September 11, 2021 15:04
Copy link
Contributor

@HAEKADI HAEKADI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @maukoquiroga! Great contribution 👏
I ran all the tests and they passed :)

I don't think I know enough to approve the structural changes though 😅

openfisca_core/entities/tests/test_variable_proxy.py Outdated Show resolved Hide resolved
openfisca_core/entities/tests/test_variable_proxy.py Outdated Show resolved Hide resolved
openfisca_core/entities/__init__.py Outdated Show resolved Hide resolved
openfisca_core/entities/__init__.py Outdated Show resolved Hide resolved
openfisca_core/entities/_variable_proxy.py Outdated Show resolved Hide resolved
Co-authored-by: Hajar AIT EL KADI <[email protected]>
@bonjourmauko
Copy link
Member Author

@MattiSG @sandcha @benjello ?

@benjello
Copy link
Member

benjello commented Nov 8, 2021

I am in favor of more typing, more doctest but I am ,ot sure I can grasp all the implications of the changes ...

Copy link
Member

@MattiSG MattiSG left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The added documentation is great!
However, I don't understand the consequences of introducing an abstract base class system and this “variable proxy” system, so I cannot approve that PR 😕

openfisca_core/commons/__init__.py Show resolved Hide resolved
openfisca_core/entities/tests/test_entity.py Outdated Show resolved Hide resolved
openfisca_core/entities/tests/test_group_entity.py Outdated Show resolved Hide resolved
openfisca_core/entities/tests/test_role.py Outdated Show resolved Hide resolved
Co-authored-by: Matti Schneider <[email protected]>
@bonjourmauko
Copy link
Member Author

The added documentation is great! However, I don't understand the consequences of introducing an abstract base class system

Thanks @MattiSG !

Concerning the abstract class system, it is not one. As I've described it:

This PR consolidates the notion of protocols, or duck-typing, or structural sub-typing: that means type-checks are done not against the actual implementation of a model, but against a protocol, that is, the equivalent of an interface, or an abstract model, but without impact at runtime.

Structural sub-typing is described in PEP 544.

Quoting it:

At runtime, protocol classes will be simple ABCs. There is no intent to provide sophisticated runtime instance and class checks against protocol classes. (...) Protocols are completely optional:

  • No runtime semantics will be imposed for variables or parameters annotated with a protocol class.
  • Any checks will be performed only by third-party type checkers and other tools.

In other words, there are no consequences beyond type-checks —no runtime impact then.

To make the point, an Entity is to an EntityProtocol the same a List is to a Sequence: instead of type-checking for implementation, we type-check for behaviour (that is why it is nicknames "duck-typing").

This has, however, one major benefit: we remove circular module dependencies from the codebase, which has an actual positive impact in terms of modularity and testability.

and this “variable proxy” system, so I cannot approve that PR 😕

That is just an abstraction, or in DDD, a port.

As I see it has two impacts:

  • It adds explicitness —we encapsulate the relationship between entities and variables and make it explicit
  • It adds modularity —again it helps decoupling models and making a clear distinction between data/logic

Beyond the "design" aspect of it, the actual recipe, the descriptor, is just standard Python —a huge deal of the function system, properties, and so on, are just syntactic sugar; behind-the-scenes they're just descriptors.

Finally, as you can see from the changeset, the introduced syntax is more coherent with the projector system used everywhere in OpenFisca, which should, sooner rather than later, IMHO, be reimplemented as a descriptor as well.

@bonjourmauko bonjourmauko added kind:test Adding missing tests or correcting existing tests and removed kind:refactor Refactoring and code cleanup labels Oct 1, 2024
@bonjourmauko bonjourmauko changed the title [3/17] Improve entities module docs, doctests, and typing test(entities): improve docs, doctests, and typing Oct 1, 2024
@bonjourmauko bonjourmauko self-assigned this Oct 1, 2024
@bonjourmauko
Copy link
Member Author

Superseded by #1255

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind:test Adding missing tests or correcting existing tests
Projects
Status: Abandoned
Development

Successfully merging this pull request may close these issues.

4 participants